Overview

Dataset statistics

Number of variables: 25
Number of observations: 3392
Missing cells: 13
Missing cells (%): < 0.1%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 665.8 KiB
Average record size in memory: 201.0 B
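
The two derived figures in this table follow from the raw counts. The sketch below is only an arithmetic cross-check of the reported values; the variable names are illustrative, and it assumes 1 KiB = 1024 bytes.

```python
# Values copied from the dataset statistics above.
n_variables = 25
n_observations = 3392
missing_cells = 13
total_size_kib = 665.8  # assumption: 1 KiB = 1024 bytes

# Share of missing cells among all cells (rows x columns), in percent.
missing_pct = 100 * missing_cells / (n_variables * n_observations)

# Average in-memory size of one record, in bytes.
avg_record_bytes = total_size_kib * 1024 / n_observations

print(f"Missing cells (%): {missing_pct:.3f}")
print(f"Average record size: {avg_record_bytes:.1f} B")
```

The missing share comes out far below 0.1%, which is why the report shows it only as "< 0.1%".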

Variable types

Numeric: 8
Categorical: 15
Boolean: 1
DateTime: 1

Alerts

p_throws has constant value "L" (Constant)
pitcher_id has constant value "477132" (Constant)
ab_id is highly correlated with g_id (High correlation)
g_id is highly correlated with ab_id (High correlation)
inning is highly correlated with p_score and 1 other field (High correlation)
o is highly correlated with outs (High correlation)
p_score is highly correlated with inning and 1 other field (High correlation)
event_num is highly correlated with inning and 1 other field (High correlation)
b_count is highly correlated with pitch_num (High correlation)
s_count is highly correlated with pitch_num (High correlation)
outs is highly correlated with o (High correlation)
pitch_num is highly correlated with b_count and 1 other field (High correlation)
ab_id is highly correlated with g_id (High correlation)
g_id is highly correlated with ab_id (High correlation)
inning is highly correlated with event_num (High correlation)
o is highly correlated with outs (High correlation)
p_score is highly correlated with event_num (High correlation)
event_num is highly correlated with inning and 1 other field (High correlation)
b_count is highly correlated with pitch_num (High correlation)
s_count is highly correlated with pitch_num (High correlation)
outs is highly correlated with o (High correlation)
pitch_num is highly correlated with b_count and 1 other field (High correlation)
code is highly correlated with type and 2 other fields (High correlation)
type is highly correlated with code and 2 other fields (High correlation)
b_count is highly correlated with pitcher_id and 1 other field (High correlation)
s_count is highly correlated with pitcher_id and 1 other field (High correlation)
on_3b is highly correlated with pitcher_id and 1 other field (High correlation)
o is highly correlated with pitcher_id and 2 other fields (High correlation)
pcodes is highly correlated with pitch_type and 2 other fields (High correlation)
on_1b is highly correlated with pitcher_id and 1 other field (High correlation)
event is highly correlated with pitcher_id and 1 other field (High correlation)
on_2b is highly correlated with pitcher_id and 1 other field (High correlation)
pitch_type is highly correlated with pcodes and 2 other fields (High correlation)
pitcher_id is highly correlated with code and 14 other fields (High correlation)
top is highly correlated with pitcher_id and 1 other field (High correlation)
outs is highly correlated with o and 2 other fields (High correlation)
p_throws is highly correlated with code and 14 other fields (High correlation)
stand is highly correlated with pitcher_id and 1 other field (High correlation)
ab_id is highly correlated with g_id and 3 other fields (High correlation)
batter_id is highly correlated with date (High correlation)
event is highly correlated with o and 4 other fields (High correlation)
g_id is highly correlated with ab_id and 3 other fields (High correlation)
inning is highly correlated with p_score and 1 other field (High correlation)
o is highly correlated with event and 1 other field (High correlation)
p_score is highly correlated with ab_id and 4 other fields (High correlation)
top is highly correlated with date (High correlation)
date is highly correlated with ab_id and 6 other fields (High correlation)
code is highly correlated with event and 2 other fields (High correlation)
type is highly correlated with event and 1 other field (High correlation)
pitch_type is highly correlated with event and 3 other fields (High correlation)
event_num is highly correlated with inning and 1 other field (High correlation)
b_score is highly correlated with ab_id and 2 other fields (High correlation)
b_count is highly correlated with pitch_num (High correlation)
s_count is highly correlated with pitch_type and 1 other field (High correlation)
outs is highly correlated with o (High correlation)
pitch_num is highly correlated with b_count and 1 other field (High correlation)
pcodes is highly correlated with pitch_type (High correlation)
p_score has 1318 (38.9%) zeros (Zeros)
b_score has 1862 (54.9%) zeros (Zeros)
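
The alerts above rest on two kinds of check: the share of zero values in a column and the strength of pairwise correlation. The sketch below illustrates both with the standard library only. The zero counts are taken from the alerts; the toy `pitch_num` and `b_count` lists and the 0.9 threshold are hypothetical values chosen for illustration, not the report's data or configuration.

```python
from math import sqrt

N_OBSERVATIONS = 3392  # from the dataset statistics


def zero_share(zero_count, total):
    """Percentage of zero-valued rows, rounded to one decimal place."""
    return round(100 * zero_count / total, 1)


def pearson(xs, ys):
    """Plain Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Zero counts as reported in the alerts.
zero_pct = {
    "p_score": zero_share(1318, N_OBSERVATIONS),
    "b_score": zero_share(1862, N_OBSERVATIONS),
}

# Hypothetical toy data: pitch number within an at-bat and the ball count.
toy_pitch_num = [1, 2, 3, 4, 5]
toy_b_count = [0, 1, 1, 2, 3]
THRESHOLD = 0.9  # hypothetical cut-off for a "high correlation" flag

r = pearson(toy_pitch_num, toy_b_count)
print(zero_pct, round(r, 3), abs(r) > THRESHOLD)
```

Because the correlation alerts repeat once per coefficient, the same pair can appear several times in the list above.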

Reproduction

Analysis started: 2021-11-06 22:51:04.540975
Analysis finished: 2021-11-06 22:51:13.886532
Duration: 9.35 seconds
Software version: pandas-profiling v3.1.0
Download configuration: config.json
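
The reported duration is the difference between the two timestamps. A minimal check, assuming both timestamps are in the same time zone:

```python
from datetime import datetime

FMT = "%Y-%m-%d %H:%M:%S.%f"

# Timestamps copied from the reproduction details above.
started = datetime.strptime("2021-11-06 22:51:04.540975", FMT)
finished = datetime.strptime("2021-11-06 22:51:13.886532", FMT)

duration_seconds = (finished - started).total_seconds()
print(f"Duration: {duration_seconds:.2f} seconds")
```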

Variables

ab_id
Real number (ℝ≥0)

HIGH CORRELATION

Distinct: 897
Distinct (%): 26.4%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 2015091774
Minimum: 2015000768
Maximum: 2015183800
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 53.0 KiB

Quantile statistics

Minimum: 2015000768
5-th percentile: 2015005032
Q1: 2015045807
median: 2015090365
Q3: 2015139176
95-th percentile: 2015172599
Maximum: 2015183800
Range: 183032
Interquartile range (IQR): 93369

Descriptive statistics

Standard deviation: 53785.52681
Coefficient of variation (CV): 2.669135347 × 10^-5
Kurtosis: -1.195201581
Mean: 2015091774
Median Absolute Deviation (MAD): 48781
Skewness: -0.02087817978
Sum: 6.835191297 × 10^12
Variance: 2892882894
Monotonicity: Increasing
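
Several of the ab_id figures are derived from others in the same tables. The sketch below recomputes them from the reported values as a consistency check; it uses the displayed (rounded) mean and standard deviation, so the last digits can differ slightly.

```python
# Values copied from the ab_id statistics above.
minimum, maximum = 2015000768, 2015183800
q1, q3 = 2015045807, 2015139176
mean = 2015091774
std = 53785.52681

value_range = maximum - minimum  # Range
iqr = q3 - q1                    # Interquartile range (IQR)
cv = std / mean                  # Coefficient of variation (CV)
variance = std ** 2              # Variance

print(value_range, iqr, f"{cv:.9e}", round(variance))
```

The tiny CV reflects that ab_id is an identifier with a large constant offset (the 2015 prefix), so its mean and spread carry little meaning.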
Histogram with fixed size bins (bins=50)

Common values

Value               Count   Frequency (%)
2015022708          10      0.3%
2015162193          10      0.3%
2015183799          10      0.3%
2015172572          9       0.3%
2015028322          9       0.3%
2015168079          9       0.3%
2015116076          9       0.3%
2015057261          9       0.3%
2015011143          9       0.3%
2015016262          9       0.3%
Other values (887)  3299    97.3%

Smallest values

Value               Count   Frequency (%)
2015000768          3       0.1%
2015000769          5       0.1%
2015000770          5       0.1%
2015000771          3       0.1%
2015000772          3       0.1%
2015000777          1       < 0.1%
2015000778          1       < 0.1%
2015000779          5       0.1%
2015000783          3       0.1%
2015000784          2       0.1%

Largest values

Value               Count   Frequency (%)
2015183800          5       0.1%
2015183799          10      0.3%
2015183798          8       0.2%
2015183793          3       0.1%
2015183792          3       0.1%
2015183791          3       0.1%
2015183785          5       0.1%
2015183784          5       0.1%
2015183783          4       0.1%
2015183778          2       0.1%

batter_id
Real number (ℝ≥0)

HIGH CORRELATION

Distinct: 221
Distinct (%): 6.5%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 496396.7556
Minimum: 112526
Maximum: 630111
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 53.0 KiB

Quantile statistics

Minimum: 112526
5-th percentile: 424325
Q1: 453211
median: 493114
Q3: 543939
95-th percentile: 607054
Maximum: 630111
Range: 517585
Interquartile range (IQR): 90728

Descriptive statistics

Standard deviation: 75201.10871
Coefficient of variation (CV): 0.1514939569
Kurtosis: 6.323821593
Mean: 496396.7556
Median Absolute Deviation (MAD): 46633
Skewness: -1.408403215
Sum: 1683777795
Variance: 5655206752
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)

Common values

Value               Count   Frequency (%)
518934              69      2.0%
434636              63      1.9%
457763              59      1.7%
501647              59      1.7%
453568              58      1.7%
571448              55      1.6%
622110              54      1.6%
493114              51      1.5%
474832              48      1.4%
460026              46      1.4%
Other values (211)  2830    83.4%

Smallest values

Value               Count   Frequency (%)
112526              5       0.1%
133380              33      1.0%
150029              13      0.4%
150212              3       0.1%
346798              2       0.1%
400085              12      0.4%
405395              31      0.9%
407781              31      0.9%
407812              11      0.3%
407822              8       0.2%

Largest values

Value               Count   Frequency (%)
630111              6       0.2%
628356              15      0.4%
628333              6       0.2%
623143              6       0.2%
622110              54      1.6%
621043              10      0.3%
608700              8       0.2%
608671              3       0.1%
608596              8       0.2%
608365              23      0.7%

event
Categorical

HIGH CORRELATION

Distinct: 22
Distinct (%): 0.6%
Missing: 0
Missing (%): 0.0%
Memory size: 53.0 KiB

Value              Count
Strikeout          1390
Groundout          583
Single             373
Walk               236
Flyout             200
Other values (17)  610

Length

Max length: 19
Median length: 9
Mean length: 7.958726415
Min length: 4

Characters and Unicode

Total characters: 0
Distinct characters: 0
Distinct categories: 0
Distinct scripts: 0
Distinct blocks: 0
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 1
Unique (%): < 0.1%

Sample

1st row: Hit By Pitch
2nd row: Hit By Pitch
3rd row: Hit By Pitch
4th row: Strikeout
5th row: Strikeout

Common Values

Value              Count   Frequency (%)
Strikeout          1390    41.0%
Groundout          583     17.2%
Single             373     11.0%
Walk               236     7.0%
Flyout             200     5.9%
Lineout            175     5.2%
Pop Out            117     3.4%
Double             90      2.7%
Forceout           65      1.9%
Home Run           53      1.6%
Other values (12)  110     3.2%
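
The overview and Common Values tables show the same counts at two cut-offs, with the remainder folded into an "Other values" row. The sketch below shows one way to build such a row. `collapse_top_n` is a hypothetical helper, not part of pandas-profiling, and the input holds only the ten categories named above.

```python
def collapse_top_n(counts, n, other_label="Other values"):
    """Keep the n most frequent categories and sum the rest into one row."""
    ranked = sorted(counts.items(), key=lambda item: item[1], reverse=True)
    top, rest = ranked[:n], ranked[n:]
    if rest:
        top.append((f"{other_label} ({len(rest)})", sum(c for _, c in rest)))
    return top


# The ten named categories from the Common Values table.
event_counts = {
    "Strikeout": 1390, "Groundout": 583, "Single": 373, "Walk": 236,
    "Flyout": 200, "Lineout": 175, "Pop Out": 117, "Double": 90,
    "Forceout": 65, "Home Run": 53,
}

top5 = collapse_top_n(event_counts, 5)
for label, count in top5:
    print(label, count)
```

Here the remainder is 500 over 5 categories; adding the 110 rows of the 12 unnamed categories gives the 610 shown as "Other values (17)" in the overview.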

Length

Histogram of lengths of the category

Words

Value              Count   Frequency (%)
strikeout          1398    37.6%
groundout          595     16.0%
single             373     10.0%
walk               240     6.4%
flyout             200     5.4%
lineout            175     4.7%
out                138     3.7%
pop                121     3.3%
double             92      2.5%
forceout           65      1.7%
Other values (19)  324     8.7%
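
The lowercase counts in this table differ from the category counts because they are counted per word rather than per label: "Pop Out" contributes to both "pop" and "out". The sketch below shows that idea under an assumed tokenisation (lowercase, split on non-alphanumeric characters); the exact rule the report used is not stated, and the three-label sample is illustrative.

```python
import re
from collections import Counter


def word_frequencies(label_counts):
    """Count lowercase word tokens, weighting each label by its row count."""
    words = Counter()
    for label, count in label_counts.items():
        # Assumed tokenisation: runs of letters or digits, lowercased.
        for token in re.findall(r"[a-z0-9]+", label.lower()):
            words[token] += count
    return words


# Illustrative subset of the event labels and their counts.
sample = {"Strikeout": 1390, "Pop Out": 117, "Flyout": 200}
freq = word_frequencies(sample)
print(freq.most_common())
```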


g_id
Real number (ℝ≥0)

HIGH CORRELATION

Distinct: 33
Distinct (%): 1.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 201501214.7
Minimum: 201500012
Maximum: 201502425
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 53.0 KiB

Quantile statistics

Minimum: 201500012
5-th percentile: 201500067
Q1: 201500606
median: 201501198
Q3: 201501842
95-th percentile: 201502276
Maximum: 201502425
Range: 2413
Interquartile range (IQR): 1236

Descriptive statistics

Standard deviation: 710.1755483
Coefficient of variation (CV): 3.524423162 × 10^-6
Kurtosis: -1.199796231
Mean: 201501214.7
Median Absolute Deviation (MAD): 644
Skewness: -0.03006276692
Sum: 6.834921203 × 10^11
Variance: 504349.3095
Monotonicity: Increasing
Histogram with fixed size bins (bins=33)

Common values

Value              Count   Frequency (%)
201501987          132     3.9%
201501271          123     3.6%
201500907          117     3.4%
201501772          116     3.4%
201501540          115     3.4%
201501842          111     3.3%
201500525          110     3.2%
201500460          110     3.2%
201501906          108     3.2%
201500987          107     3.2%
Other values (23)  2243    66.1%

Smallest values

Value              Count   Frequency (%)
201500012          99      2.9%
201500067          99      2.9%
201500147          104     3.1%
201500216          93      2.7%
201500301          92      2.7%
201500374          91      2.7%
201500460          110     3.2%
201500525          110     3.2%
201500606          99      2.9%
201500675          101     3.0%

Largest values

Value              Count   Frequency (%)
201502425          60      1.8%
201502349          104     3.1%
201502276          80      2.4%
201502217          100     2.9%
201502142          105     3.1%
201502063          106     3.1%
201501987          132     3.9%
201501906          108     3.2%
201501842          111     3.3%
201501772          116     3.4%

inning
Real number (ℝ≥0)

HIGH CORRELATION

Distinct: 9
Distinct (%): 0.3%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 4.198113208
Minimum: 1
Maximum: 9
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 53.0 KiB

Quantile statistics

Minimum: 1
5-th percentile: 1
Q1: 2
median: 4
Q3: 6
95-th percentile: 8
Maximum: 9
Range: 8
Interquartile range (IQR): 4

Descriptive statistics

Standard deviation: 2.176038238
Coefficient of variation (CV): 0.5183371983
Kurtosis: -0.9660017255
Mean: 4.198113208
Median Absolute Deviation (MAD): 2
Skewness: 0.1527614284
Sum: 14240
Variance: 4.735142414
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=9)

Common values

Value   Count   Frequency (%)
4       559     16.5%
1       482     14.2%
6       459     13.5%
2       437     12.9%
5       434     12.8%
3       424     12.5%
7       387     11.4%
8       147     4.3%
9       63      1.9%

Smallest values

Value   Count   Frequency (%)
1       482     14.2%
2       437     12.9%
3       424     12.5%
4       559     16.5%
5       434     12.8%
6       459     13.5%
7       387     11.4%
8       147     4.3%
9       63      1.9%

Largest values

Value   Count   Frequency (%)
9       63      1.9%
8       147     4.3%
7       387     11.4%
6       459     13.5%
5       434     12.8%
4       559     16.5%
3       424     12.5%
2       437     12.9%
1       482     14.2%
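
Because inning takes only nine values, its frequency table is enough to reproduce the reported sum, mean, variance and standard deviation. The sketch below does so, assuming the report uses the sample variance (divisor n - 1); that convention is inferred from the numbers matching rather than stated in the report.

```python
from math import sqrt

# Value -> count, copied from the inning frequency table above.
inning_counts = {1: 482, 2: 437, 3: 424, 4: 559, 5: 434,
                 6: 459, 7: 387, 8: 147, 9: 63}

n = sum(inning_counts.values())
total = sum(value * count for value, count in inning_counts.items())
mean = total / n

# Sample variance (divisor n - 1), assumed to be the report's convention.
squared_dev = sum(count * (value - mean) ** 2
                  for value, count in inning_counts.items())
variance = squared_dev / (n - 1)
std = sqrt(variance)

print(n, total, round(mean, 9), round(variance, 9), round(std, 9))
```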

o
Categorical

HIGH CORRELATION

Distinct: 4
Distinct (%): 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 53.0 KiB

Value   Count
1       1139
2       1104
3       859
0       290

Length

Max length: 1
Median length: 1
Mean length: 1
Min length: 1

Characters and Unicode

Total characters: 0
Distinct characters: 0
Distinct categories: 0
Distinct scripts: 0
Distinct blocks: 0
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 0
2nd row: 0
3rd row: 0
4th row: 1
5th row: 1

Common Values

Value   Count   Frequency (%)
1       1139    33.6%
2       1104    32.5%
3       859     25.3%
0       290     8.5%

Length

Histogram of lengths of the category

Pie chart

Words

Value   Count   Frequency (%)
1       1139    33.6%
2       1104    32.5%
3       859     25.3%
0       290     8.5%


p_score
Real number (ℝ≥0)

HIGH CORRELATION
ZEROS

Distinct: 8
Distinct (%): 0.2%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 1.521521226
Minimum: 0
Maximum: 8
Zeros: 1318
Zeros (%): 38.9%
Negative: 0
Negative (%): 0.0%
Memory size: 53.0 KiB

Quantile statistics

Minimum: 0
5-th percentile: 0
Q1: 0
median: 1
Q3: 2
95-th percentile: 6
Maximum: 8
Range: 8
Interquartile range (IQR): 2

Descriptive statistics

Standard deviation: 1.906578729
Coefficient of variation (CV): 1.253074026
Kurtosis: 2.202313762
Mean: 1.521521226
Median Absolute Deviation (MAD): 1
Skewness: 1.613558561
Sum: 5161
Variance: 3.635042451
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=8)

Common values

Value   Count   Frequency (%)
0       1318    38.9%
1       848     25.0%
2       569     16.8%
3       178     5.2%
5       149     4.4%
4       128     3.8%
6       116     3.4%
8       86      2.5%

Smallest values

Value   Count   Frequency (%)
0       1318    38.9%
1       848     25.0%
2       569     16.8%
3       178     5.2%
4       128     3.8%
5       149     4.4%
6       116     3.4%
8       86      2.5%

Largest values

Value   Count   Frequency (%)
8       86      2.5%
6       116     3.4%
5       149     4.4%
4       128     3.8%
3       178     5.2%
2       569     16.8%
1       848     25.0%
0       1318    38.9%
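
The median and the Median Absolute Deviation (MAD) reported for p_score can be recovered from its frequency table. The sketch below expands the table back into individual values and takes the median of the absolute deviations from the median; it is a check on the reported figures, not the report's own code.

```python
from statistics import median

# Value -> count, copied from the p_score frequency table above.
p_score_counts = {0: 1318, 1: 848, 2: 569, 3: 178,
                  4: 128, 5: 149, 6: 116, 8: 86}

# Expand the table into one entry per observation.
values = [value for value, count in p_score_counts.items()
          for _ in range(count)]

med = median(values)
mad = median(abs(value - med) for value in values)

print(len(values), sum(values), med, mad)
```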

p_throws
Categorical

CONSTANT
HIGH CORRELATION
REJECTED

Distinct: 1
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 53.0 KiB

Value   Count
L       3392

Length

Max length: 1
Median length: 1
Mean length: 1
Min length: 1

Characters and Unicode

Total characters: 0
Distinct characters: 0
Distinct categories: 0
Distinct scripts: 0
Distinct blocks: 0
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: L
2nd row: L
3rd row: L
4th row: L
5th row: L

Common Values

Value   Count   Frequency (%)
L       3392    100.0%

Length

Histogram of lengths of the category

Pie chart

Words

Value   Count   Frequency (%)
l       3392    100.0%


pitcher_id
Categorical

CONSTANT
HIGH CORRELATION
REJECTED

Distinct: 1
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 53.0 KiB

Value    Count
477132   3392

Length

Max length: 6
Median length: 6
Mean length: 6
Min length: 6

Characters and Unicode

Total characters: 0
Distinct characters: 0
Distinct categories: 0
Distinct scripts: 0
Distinct blocks: 0
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 477132
2nd row: 477132
3rd row: 477132
4th row: 477132
5th row: 477132

Common Values

Value    Count   Frequency (%)
477132   3392    100.0%

Length

Histogram of lengths of the category

Pie chart

Words

Value    Count   Frequency (%)
477132   3392    100.0%
100.0%


stand
Categorical

HIGH CORRELATION

Distinct: 2
Distinct (%): 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 53.0 KiB

Value   Count
R       2535
L       857

Length

Max length: 1
Median length: 1
Mean length: 1
Min length: 1

Characters and Unicode

Total characters: 0
Distinct characters: 0
Distinct categories: 0
Distinct scripts: 0
Distinct blocks: 0
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: R
2nd row: R
3rd row: R
4th row: R
5th row: R

Common Values

Value   Count   Frequency (%)
R       2535    74.7%
L       857     25.3%

Length

Histogram of lengths of the category

Pie chart

Words

Value   Count   Frequency (%)
r       2535    74.7%
l       857     25.3%


top
Boolean

HIGH CORRELATION

Distinct: 2
Distinct (%): 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 29.8 KiB

Value   Count   Frequency (%)
True    1745    51.4%
False   1647    48.6%

date
Date

HIGH CORRELATION

Distinct: 33
Distinct (%): 1.0%
Missing: 0
Missing (%): 0.0%
Memory size: 53.0 KiB
Minimum: 2015-04-06 00:00:00
Maximum: 2015-10-04 00:00:00
Histogram with fixed size bins (bins=33)
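
The date range and the number of distinct dates give a rough sense of spacing. The sketch below computes the span in days and the mean gap between consecutive distinct dates; treating each distinct date as one appearance is an assumption based on the matching count of 33 distinct g_id values.

```python
from datetime import date

# Minimum and maximum copied from the date statistics above.
first_date = date(2015, 4, 6)
last_date = date(2015, 10, 4)
distinct_dates = 33

span_days = (last_date - first_date).days

# 33 distinct dates leave 32 gaps between consecutive dates.
mean_gap_days = span_days / (distinct_dates - 1)

print(span_days, mean_gap_days)
```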

code
Categorical

HIGH CORRELATION

Distinct: 14
Distinct (%): 0.4%
Missing: 0
Missing (%): 0.0%
Memory size: 53.0 KiB

Value             Count
B                 981
F                 624
C                 558
S                 466
X                 369
Other values (9)  394

Length

Max length: 2
Median length: 1
Mean length: 1.029775943
Min length: 1

Characters and Unicode

Total characters: 0
Distinct characters: 0
Distinct categories: 0
Distinct scripts: 0
Distinct blocks: 0
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 1
Unique (%): < 0.1%

Sample

1st row: F
2nd row: C
3rd row: H
4th row: B
5th row: C

Common Values

Value             Count   Frequency (%)
B                 981     28.9%
F                 624     18.4%
C                 558     16.5%
S                 466     13.7%
X                 369     10.9%
D                 128     3.8%
*B                101     3.0%
W                 73      2.2%
E                 45      1.3%
T                 28      0.8%
Other values (4)  19      0.6%

Length

Histogram of lengths of the category

Words

Value             Count   Frequency (%)
b                 1082    31.9%
f                 624     18.4%
c                 558     16.5%
s                 466     13.7%
x                 369     10.9%
d                 128     3.8%
w                 73      2.2%
e                 45      1.3%
t                 28      0.8%
l                 11      0.3%
Other values (3)  8       0.2%


type
Categorical

HIGH CORRELATION

Distinct: 3
Distinct (%): 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 53.0 KiB

Value   Count
S       1761
B       1089
X       542

Length

Max length: 1
Median length: 1
Mean length: 1
Min length: 1

Characters and Unicode

Total characters: 0
Distinct characters: 0
Distinct categories: 0
Distinct scripts: 0
Distinct blocks: 0
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: S
2nd row: S
3rd row: B
4th row: B
5th row: S

Common Values

Value   Count   Frequency (%)
S       1761    51.9%
B       1089    32.1%
X       542     16.0%

Length

Histogram of lengths of the category

Pie chart

Words

Value   Count   Frequency (%)
s       1761    51.9%
b       1089    32.1%
x       542     16.0%


pitch_type
Categorical

HIGH CORRELATION

Distinct: 6
Distinct (%): 0.2%
Missing: 6
Missing (%): 0.2%
Memory size: 53.0 KiB

Value   Count
FF      1722
SL      932
CU      616
FT      107
CH      8

Length

Max length: 2
Median length: 2
Mean length: 2
Min length: 2

Characters and Unicode

Total characters: 0
Distinct characters: 0
Distinct categories: 0
Distinct scripts: 0
Distinct blocks: 0
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 1
Unique (%): < 0.1%

Sample

1st row: FF
2nd row: FF
3rd row: FF
4th row: FF
5th row: FF

Common Values

Value       Count   Frequency (%)
FF          1722    50.8%
SL          932     27.5%
CU          616     18.2%
FT          107     3.2%
CH          8       0.2%
IN          1       < 0.1%
(Missing)   6       0.2%

Length

Histogram of lengths of the category

Pie chart

Words

Value   Count   Frequency (%)
ff      1722    50.9%
sl      932     27.5%
cu      616     18.2%
ft      107     3.2%
ch      8       0.2%
in      1       < 0.1%
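
The two pitch_type tables give slightly different percentages for the same counts (FF is 50.8% in one and 50.9% in the other). The numbers are consistent with the first table dividing by all rows and the second by non-missing rows only. The sketch below checks that reading; it is an inference from the figures, not documented behaviour.

```python
# Counts copied from the pitch_type Common Values table above.
pitch_counts = {"FF": 1722, "SL": 932, "CU": 616, "FT": 107, "CH": 8, "IN": 1}
missing = 6

n_present = sum(pitch_counts.values())
n_total = n_present + missing

# Percentages over all rows (missing values included in the denominator).
pct_all = {k: round(100 * c / n_total, 1) for k, c in pitch_counts.items()}

# Percentages over non-missing rows only.
pct_present = {k: round(100 * c / n_present, 1) for k, c in pitch_counts.items()}

print(n_total, pct_all["FF"], pct_present["FF"])
```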


event_num
Real number (ℝ≥0)

HIGH CORRELATION

Distinct: 522
Distinct (%): 15.4%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 229.7777123
Minimum: 3
Maximum: 560
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 53.0 KiB

Quantile statistics

Minimum: 3
5-th percentile: 19
Q1: 107
median: 223
Q3: 345
95-th percentile: 464.45
Maximum: 560
Range: 557
Interquartile range (IQR): 238

Descriptive statistics

Standard deviation: 140.8944469
Coefficient of variation (CV): 0.6131771683
Kurtosis: -1.018578309
Mean: 229.7777123
Median Absolute Deviation (MAD): 119
Skewness: 0.1726221885
Sum: 779406
Variance: 19851.24518
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)

Common values

Value               Count   Frequency (%)
3                   17      0.5%
4                   15      0.4%
5                   14      0.4%
50                  14      0.4%
11                  14      0.4%
51                  13      0.4%
297                 13      0.4%
218                 13      0.4%
166                 12      0.4%
165                 12      0.4%
Other values (512)  3255    96.0%

Smallest values

Value   Count   Frequency (%)
3       17      0.5%
4       15      0.4%
5       14      0.4%
6       9       0.3%
7       9       0.3%
8       6       0.2%
9       9       0.3%
10      10      0.3%
11      14      0.4%
12      11      0.3%

Largest values

Value   Count   Frequency (%)
560     1       < 0.1%
559     2       0.1%
558     2       0.1%
557     1       < 0.1%
556     1       < 0.1%
554     1       < 0.1%
553     1       < 0.1%
550     1       < 0.1%
549     2       0.1%
548     2       0.1%

b_score
Real number (ℝ≥0)

HIGH CORRELATION
ZEROS

Distinct: 6
Distinct (%): 0.2%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 0.8166273585
Minimum: 0
Maximum: 5
Zeros: 1862
Zeros (%): 54.9%
Negative: 0
Negative (%): 0.0%
Memory size: 53.0 KiB

Quantile statistics

Minimum: 0
5-th percentile: 0
Q1: 0
median: 0
Q3: 1
95-th percentile: 3
Maximum: 5
Range: 5
Interquartile range (IQR): 1

Descriptive statistics

Standard deviation: 1.154778287
Coefficient of variation (CV): 1.414082292
Kurtosis: 2.055413806
Mean: 0.8166273585
Median Absolute Deviation (MAD): 0
Skewness: 1.574418781
Sum: 2770
Variance: 1.333512892
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=6)

Common values

Value   Count   Frequency (%)
0       1862    54.9%
1       816     24.1%
2       382     11.3%
3       185     5.5%
4       100     2.9%
5       47      1.4%

Smallest values

Value   Count   Frequency (%)
0       1862    54.9%
1       816     24.1%
2       382     11.3%
3       185     5.5%
4       100     2.9%
5       47      1.4%

Largest values

Value   Count   Frequency (%)
5       47      1.4%
4       100     2.9%
3       185     5.5%
2       382     11.3%
1       816     24.1%
0       1862    54.9%

b_count
Categorical

HIGH CORRELATION

Distinct: 4
Distinct (%): 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 53.0 KiB

Value   Count
0.0     1664
1.0     1019
2.0     505
3.0     204

Length

Max length: 3
Median length: 3
Mean length: 3
Min length: 3

Characters and Unicode

Total characters: 0
Distinct characters: 0
Distinct categories: 0
Distinct scripts: 0
Distinct blocks: 0
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 0.0
2nd row: 0.0
3rd row: 0.0
4th row: 0.0
5th row: 1.0

Common Values

Value   Count   Frequency (%)
0.0     1664    49.1%
1.0     1019    30.0%
2.0     505     14.9%
3.0     204     6.0%

Length

Histogram of lengths of the category

Pie chart

Words

Value   Count   Frequency (%)
0.0     1664    49.1%
1.0     1019    30.0%
2.0     505     14.9%
3.0     204     6.0%


s_count
Categorical

HIGH CORRELATION

Distinct: 3
Distinct (%): 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 53.0 KiB

Value   Count
0.0     1286
2.0     1061
1.0     1045

Length

Max length: 3
Median length: 3
Mean length: 3
Min length: 3

Characters and Unicode

Total characters: 0
Distinct characters: 0
Distinct categories: 0
Distinct scripts: 0
Distinct blocks: 0
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 0.0
2nd row: 1.0
3rd row: 2.0
4th row: 0.0
5th row: 0.0

Common Values

Value   Count   Frequency (%)
0.0     1286    37.9%
2.0     1061    31.3%
1.0     1045    30.8%

Length

Histogram of lengths of the category

Pie chart

Words

Value   Count   Frequency (%)
0.0     1286    37.9%
2.0     1061    31.3%
1.0     1045    30.8%


outs
Categorical

HIGH CORRELATION

Distinct: 3
Distinct (%): 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 53.0 KiB

Value   Count
0.0     1193
1.0     1123
2.0     1076

Length

Max length: 3
Median length: 3
Mean length: 3
Min length: 3

Characters and Unicode

Total characters: 0
Distinct characters: 0
Distinct categories: 0
Distinct scripts: 0
Distinct blocks: 0
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 0.0
2nd row: 0.0
3rd row: 0.0
4th row: 0.0
5th row: 0.0

Common Values

Value   Count   Frequency (%)
0.0     1193    35.2%
1.0     1123    33.1%
2.0     1076    31.7%

Length

Histogram of lengths of the category

Pie chart

Words

Value   Count   Frequency (%)
0.0     1193    35.2%
1.0     1123    33.1%
2.0     1076    31.7%


pitch_num
Real number (ℝ≥0)

HIGH CORRELATION

Distinct: 10
Distinct (%): 0.3%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 2.840801887
Minimum: 1
Maximum: 10
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 53.0 KiB

Quantile statistics

Minimum: 1
5-th percentile: 1
Q1: 1
median: 3
Q3: 4
95-th percentile: 6
Maximum: 10
Range: 9
Interquartile range (IQR): 3

Descriptive statistics

Standard deviation: 1.681164674
Coefficient of variation (CV): 0.5917922971
Kurtosis: 0.5306867948
Mean: 2.840801887
Median Absolute Deviation (MAD): 1
Skewness: 0.9040000346
Sum: 9636
Variance: 2.826314662
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=10)

Common values

Value   Count   Frequency (%)
1       897     26.4%
2       775     22.8%
3       667     19.7%
4       494     14.6%
5       310     9.1%
6       140     4.1%
7       66      1.9%
8       30      0.9%
9       10      0.3%
10      3       0.1%

Smallest values

Value   Count   Frequency (%)
1       897     26.4%
2       775     22.8%
3       667     19.7%
4       494     14.6%
5       310     9.1%
6       140     4.1%
7       66      1.9%
8       30      0.9%
9       10      0.3%
10      3       0.1%

Largest values

Value   Count   Frequency (%)
10      3       0.1%
9       10      0.3%
8       30      0.9%
7       66      1.9%
6       140     4.1%
5       310     9.1%
4       494     14.6%
3       667     19.7%
2       775     22.8%
1       897     26.4%

on_1b
Categorical

HIGH CORRELATION

Distinct2
Distinct (%)0.1%
Missing0
Missing (%)0.0%
Memory size53.0 KiB
0.0
2541 
1.0
851 

Length

Max length3
Median length3
Mean length3
Min length3

Characters and Unicode

Total characters0
Distinct characters0
Distinct categories0 ?
Distinct scripts0 ?
Distinct blocks0 ?
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique0 ?
Unique (%)0.0%

Sample

1st row0.0
2nd row0.0
3rd row0.0
4th row1.0
5th row1.0

Common Values

ValueCountFrequency (%)
0.02541
74.9%
1.0851
 
25.1%

Length

[Figure: Histogram of lengths of the category]

Pie chart

[Figure: pie chart of the value frequencies listed under Common Values]

on_2b
Categorical

HIGH CORRELATION

Distinct        2
Distinct (%)    0.1%
Missing         0
Missing (%)     0.0%
Memory size     53.0 KiB

Length

Max length       3
Median length    3
Mean length      3
Min length       3

Characters and Unicode

Total characters       0
Distinct characters    0
Distinct categories    0
Distinct scripts       0
Distinct blocks        0
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique        0
Unique (%)    0.0%

Sample

1st row    0.0
2nd row    0.0
3rd row    0.0
4th row    0.0
5th row    0.0

Common Values

Value    Count    Frequency (%)
0.0      2958     87.2%
1.0      434      12.8%

Length

[Figure: Histogram of lengths of the category]

Pie chart

[Figure: pie chart of the value frequencies listed under Common Values]

on_3b
Categorical

HIGH CORRELATION

Distinct        2
Distinct (%)    0.1%
Missing         0
Missing (%)     0.0%
Memory size     53.0 KiB

Length

Max length       3
Median length    3
Mean length      3
Min length       3

Characters and Unicode

Total characters       0
Distinct characters    0
Distinct categories    0
Distinct scripts       0
Distinct blocks        0
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique        0
Unique (%)    0.0%

Sample

1st row    0.0
2nd row    0.0
3rd row    0.0
4th row    0.0
5th row    0.0

Common Values

Value    Count    Frequency (%)
0.0      3175     93.6%
1.0      217      6.4%

Length

[Figure: Histogram of lengths of the category]

Pie chart

[Figure: pie chart of the value frequencies listed under Common Values]

pcodes
Categorical

HIGH CORRELATION

Distinct        5
Distinct (%)    0.1%
Missing         7
Missing (%)     0.2%
Memory size     53.0 KiB

Length

Max length       3
Median length    3
Mean length      3
Min length       3

Characters and Unicode

Total characters       0
Distinct characters    0
Distinct categories    0
Distinct scripts       0
Distinct blocks        0
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique        0
Unique (%)    0.0%

Sample

1st row    1.0
2nd row    1.0
3rd row    1.0
4th row    1.0
5th row    1.0

Common Values

Value        Count    Frequency (%)
1.0          1722     50.8%
2.0          932      27.5%
3.0          616      18.2%
4.0          107      3.2%
5.0          8        0.2%
(Missing)    7        0.2%

Length

[Figure: Histogram of lengths of the category]

Pie chart

Value    Count    Frequency (%)
1.0      1722     50.9%
2.0      932      27.5%
3.0      616      18.2%
4.0      107      3.2%
5.0      8        0.2%
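
The two frequency listings for pcodes disagree on the first value (50.8% under Common Values, 50.9% under Pie chart). A likely explanation, offered here as an assumption rather than something the report states, is that Common Values divides by all 3392 rows while the pie chart divides by the 3385 rows that have a value. The short check below recomputes both sets of percentages from the counts shown in this section.

```python
# pcodes value -> count, copied from the Common Values table of this report.
counts = {"1.0": 1722, "2.0": 932, "3.0": 616, "4.0": 107, "5.0": 8}
missing = 7

n_present = sum(counts.values())   # rows that have a pcodes value
n_all = n_present + missing        # every row, including the missing ones

# Percentages with the missing rows kept in the denominator.
share_all = {k: round(100 * c / n_all, 1) for k, c in counts.items()}
# Percentages over non-missing rows only.
share_present = {k: round(100 * c / n_present, 1) for k, c in counts.items()}

for value in counts:
    print(value, share_all[value], share_present[value])
```

Only the most frequent value is large enough for the seven missing rows to move its rounded percentage; the other rows round to the same figure under both denominators.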

Interactions

[Figure: pairwise interaction plots between the numeric variables]

Correlations


Spearman's ρ

Spearman's rank correlation coefficient (ρ) measures monotonic correlation between two variables, and is therefore better than Pearson's r at catching nonlinear monotonic correlations. Its value lies between -1 and +1: -1 indicates total negative monotonic correlation, 0 indicates no monotonic correlation, and +1 indicates total positive monotonic correlation.

To calculate ρ for two variables X and Y, one divides the covariance of the rank variables of X and Y by the product of their standard deviations.
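
As a minimal sketch of that recipe (not the report's own implementation), the function below ranks both variables, giving tied values the mean of their ranks, and then applies the covariance-over-standard-deviations formula to the ranks. The helper names are illustrative, and the two short lists are the pitch_num and s_count values read from the first ten sample rows of this report.

```python
from math import sqrt

def average_ranks(values):
    """Rank values from 1 to n; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the run while the next sorted value ties with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1
        for k in range(start, end + 1):
            ranks[order[k]] = mean_rank
        start = end + 1
    return ranks

def pearson(x, y):
    """Covariance divided by the product of standard deviations.
    The 1/n factors cancel, so plain sums are used."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    dev_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    dev_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (dev_x * dev_y)

def spearman(x, y):
    """Spearman's rho: Pearson's r computed on the rank variables."""
    return pearson(average_ranks(x), average_ranks(y))

# Values read from the first ten sample rows of this report.
pitch_num = [1, 2, 3, 1, 2, 3, 4, 5, 1, 2]
s_count = [0, 1, 2, 0, 0, 1, 2, 2, 0, 0]
print(spearman(pitch_num, s_count))
```

Ten rows are far too few to reproduce the report's coefficients; the lists are there only to show the call.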

Pearson's r

Pearson's correlation coefficient (r) measures linear correlation between two variables. Its value lies between -1 and +1: -1 indicates total negative linear correlation, 0 indicates no linear correlation, and +1 indicates total positive linear correlation. Furthermore, r is invariant under separate changes in location and (positive) scale of the two variables, so for an exactly linear relationship the steepness of the line does not affect r; only the sign of the slope does.

To calculate r for two variables X and Y, one divides the covariance of X and Y by the product of their standard deviations.
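
The formula and the invariance property can be illustrated with a few lines of standard-library Python. This is a sketch under the definition given above, not the code the report ran; `pearson_r` is an illustrative name, and the lists hold the b_count and pitch_num values read from the first ten sample rows of this report.

```python
from math import sqrt

def pearson_r(x, y):
    """Covariance of x and y divided by the product of their standard deviations."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / n
    std_x = sqrt(sum((a - mean_x) ** 2 for a in x) / n)
    std_y = sqrt(sum((b - mean_y) ** 2 for b in y) / n)
    return cov / (std_x * std_y)

# Values read from the first ten sample rows of this report.
b_count = [0, 0, 0, 0, 1, 1, 1, 2, 0, 1]
pitch_num = [1, 2, 3, 1, 2, 3, 4, 5, 1, 2]

r = pearson_r(b_count, pitch_num)
# Shifting and positively rescaling either variable leaves r unchanged.
r_rescaled = pearson_r([2 * v + 3 for v in b_count], [0.5 * v - 1 for v in pitch_num])
# A negative scale factor on one variable flips the sign of r.
r_flipped = pearson_r([-v for v in b_count], pitch_num)
print(r, r_rescaled, r_flipped)
```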

Kendall's τ

Like Spearman's rank correlation coefficient, the Kendall rank correlation coefficient (τ) measures ordinal association between two variables. Its value lies between -1 and +1: -1 indicates total negative correlation, 0 indicates no correlation, and +1 indicates total positive correlation.

To calculate τ for two variables X and Y, one counts the concordant and discordant pairs of observations. τ is the number of concordant pairs minus the number of discordant pairs, divided by the total number of pairs.
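
A direct translation of that definition is sketched below. It implements the plain pair-counting formula exactly as worded here (often called tau-a); library implementations commonly default to a tie-adjusted variant (tau-b), so this is an illustration of the idea, not a reproduction of the report's numbers. The function name is illustrative, and the lists are the pitch_num and s_count values read from the first ten sample rows of this report.

```python
from itertools import combinations

def kendall_tau_a(x, y):
    """(concordant pairs - discordant pairs) / total number of pairs."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length with at least 2 items")
    concordant = discordant = pairs = 0
    for (x1, y1), (x2, y2) in combinations(zip(x, y), 2):
        pairs += 1
        direction = (x1 - x2) * (y1 - y2)
        if direction > 0:        # both variables move the same way
            concordant += 1
        elif direction < 0:      # the variables move in opposite ways
            discordant += 1
        # direction == 0 is a tie in x or y and counts for neither side
    return (concordant - discordant) / pairs

# Values read from the first ten sample rows of this report.
pitch_num = [1, 2, 3, 1, 2, 3, 4, 5, 1, 2]
s_count = [0, 1, 2, 0, 0, 1, 2, 2, 0, 0]
print(kendall_tau_a(pitch_num, s_count))
```

The double loop visits every pair once, which is quadratic in the number of rows; that is fine for a sketch but slow for the full 3392 observations compared with sort-based algorithms.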

Cramér's V (φc)

Cramér's V is an association measure for nominal random variables. The coefficient ranges from 0 to 1, with 0 indicating independence and 1 indicating perfect association. The empirical estimators used for Cramér's V have been shown to be biased, even for large samples. We use a bias-corrected measure proposed by Bergsma in 2013.
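
For orientation, one common formulation of the bias-corrected estimator is sketched below in standard-library Python. It is an illustration of the published correction, not a copy of the code behind this report; the function name and the choice to return 0.0 for degenerate inputs are assumptions made for the example. The two lists are the on_1b and pcodes values read from the first ten sample rows of this report.

```python
from collections import Counter
from math import sqrt

def cramers_v_corrected(x, y):
    """Bias-corrected Cramér's V for two sequences of category labels."""
    n = len(x)
    row_totals = Counter(x)
    col_totals = Counter(y)
    observed = Counter(zip(x, y))
    r, k = len(row_totals), len(col_totals)
    if n < 2 or r < 2 or k < 2:
        return 0.0  # assumption: a constant column shows no association

    # Pearson chi-squared statistic of the contingency table.
    chi2 = 0.0
    for a, row_total in row_totals.items():
        for b, col_total in col_totals.items():
            expected = row_total * col_total / n
            chi2 += (observed[(a, b)] - expected) ** 2 / expected

    phi2 = chi2 / n
    # Subtract the expected bias of phi^2 and shrink the table dimensions.
    phi2_corr = max(0.0, phi2 - (k - 1) * (r - 1) / (n - 1))
    r_corr = r - (r - 1) ** 2 / (n - 1)
    k_corr = k - (k - 1) ** 2 / (n - 1)
    denom = min(r_corr - 1, k_corr - 1)
    return sqrt(phi2_corr / denom) if denom > 0 else 0.0

# Values read from the first ten sample rows of this report.
on_1b = [0, 0, 0, 1, 1, 1, 1, 1, 1, 1]
pcodes = [1, 1, 1, 1, 1, 3, 1, 3, 1, 2]
print(cramers_v_corrected(on_1b, pcodes))
```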

Phik (φk)

Phik (φk) is a new and practical correlation coefficient that works consistently between categorical, ordinal and interval variables, captures non-linear dependency, and reverts to the Pearson correlation coefficient for a bivariate normal input distribution. Extensive documentation is available for it.

Missing values

A simple visualization of nullity by column.
The nullity matrix is a data-dense display that lets you quickly pick out patterns in data completion.
The correlation heatmap measures nullity correlation: how strongly the presence or absence of one variable affects the presence of another.
The dendrogram allows you to more fully correlate variable completion, revealing trends deeper than the pairwise ones visible in the correlation heatmap.

Sample

First rows

      ab_id       batter_id  event         g_id       inning  o  p_score  p_throws  pitcher_id  stand  top   date        code  type  pitch_type  event_num  b_score  b_count  s_count  outs  pitch_num  on_1b  on_2b  on_3b  pcodes
0     2015000768  571976     Hit By Pitch  201500012  1       0  0        L         477132      R      True  2015-04-06  F     S     FF          3          0.0      0.0      0.0      0.0   1.0        0.0    0.0    0.0    1.0
1     2015000768  571976     Hit By Pitch  201500012  1       0  0        L         477132      R      True  2015-04-06  C     S     FF          4          0.0      0.0      1.0      0.0   2.0        0.0    0.0    0.0    1.0
2     2015000768  571976     Hit By Pitch  201500012  1       0  0        L         477132      R      True  2015-04-06  H     B     FF          5          0.0      0.0      2.0      0.0   3.0        0.0    0.0    0.0    1.0
3     2015000769  519083     Strikeout     201500012  1       1  0        L         477132      R      True  2015-04-06  B     B     FF          8          0.0      0.0      0.0      0.0   1.0        1.0    0.0    0.0    1.0
4     2015000769  519083     Strikeout     201500012  1       1  0        L         477132      R      True  2015-04-06  C     S     FF          9          0.0      1.0      0.0      0.0   2.0        1.0    0.0    0.0    1.0
5     2015000769  519083     Strikeout     201500012  1       1  0        L         477132      R      True  2015-04-06  C     S     CU          10         0.0      1.0      1.0      0.0   3.0        1.0    0.0    0.0    3.0
6     2015000769  519083     Strikeout     201500012  1       1  0        L         477132      R      True  2015-04-06  B     B     FF          11         0.0      1.0      2.0      0.0   4.0        1.0    0.0    0.0    1.0
7     2015000769  519083     Strikeout     201500012  1       1  0        L         477132      R      True  2015-04-06  S     S     CU          12         0.0      2.0      2.0      0.0   5.0        1.0    0.0    0.0    3.0
8     2015000770  461314     Single        201500012  1       1  0        L         477132      R      True  2015-04-06  B     B     FF          16         0.0      0.0      0.0      1.0   1.0        1.0    0.0    0.0    1.0
9     2015000770  461314     Single        201500012  1       1  0        L         477132      R      True  2015-04-06  S     S     SL          17         0.0      1.0      0.0      1.0   2.0        1.0    0.0    0.0    2.0

Last rows

      ab_id       batter_id  event         g_id       inning  o  p_score  p_throws  pitcher_id  stand  top   date        code  type  pitch_type  event_num  b_score  b_count  s_count  outs  pitch_num  on_1b  on_2b  on_3b  pcodes
3382  2015183799  500208     Strikeout     201502425  4       2  2        L         477132      R      True  2015-10-04  F     S     FF          198        0.0      1.0      2.0      1.0   6.0        0.0    0.0    0.0    1.0
3383  2015183799  500208     Strikeout     201502425  4       2  2        L         477132      R      True  2015-10-04  F     S     CU          199        0.0      1.0      2.0      1.0   7.0        0.0    0.0    0.0    3.0
3384  2015183799  500208     Strikeout     201502425  4       2  2        L         477132      R      True  2015-10-04  F     S     SL          200        0.0      1.0      2.0      1.0   8.0        0.0    0.0    0.0    2.0
3385  2015183799  500208     Strikeout     201502425  4       2  2        L         477132      R      True  2015-10-04  B     B     SL          201        0.0      1.0      2.0      1.0   9.0        0.0    0.0    0.0    2.0
3386  2015183799  500208     Strikeout     201502425  4       2  2        L         477132      R      True  2015-10-04  S     S     SL          202        0.0      2.0      2.0      1.0   10.0       0.0    0.0    0.0    2.0
3387  2015183800  576397     Single        201502425  4       2  2        L         477132      R      True  2015-10-04  B     B     CU          206        0.0      0.0      0.0      2.0   1.0        0.0    0.0    0.0    3.0
3388  2015183800  576397     Single        201502425  4       2  2        L         477132      R      True  2015-10-04  T     S     SL          207        0.0      1.0      0.0      2.0   2.0        0.0    0.0    0.0    2.0
3389  2015183800  576397     Single        201502425  4       2  2        L         477132      R      True  2015-10-04  B     B     FF          208        0.0      1.0      1.0      2.0   3.0        0.0    0.0    0.0    1.0
3390  2015183800  576397     Single        201502425  4       2  2        L         477132      R      True  2015-10-04  B     B     FF          209        0.0      2.0      1.0      2.0   4.0        0.0    0.0    0.0    1.0
3391  2015183800  576397     Single        201502425  4       2  2        L         477132      R      True  2015-10-04  D     X     FF          210        0.0      3.0      1.0      2.0   5.0        0.0    0.0    0.0    1.0